Generate n-Grams (Characters) (Text Processing)

Synopsis

Creates character n-Grams of each token in a document.

Description

This operator creates all possible character n-grams of each token in a document. A character n-gram is a sequence of n consecutive characters. For each token, the operator generates every sequence of consecutive characters of length n that the token contains. If a token is shorter than the specified length n, the token itself is kept in the resulting document.
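The definition above can be illustrated with a minimal Python sketch. It is not the operator's actual implementation: the function name char_ngrams is hypothetical, and the sketch assumes that "series of characters" means consecutive characters taken from left to right.

```python
from typing import List


def char_ngrams(token: str, n: int) -> List[str]:
    """Return every run of n consecutive characters in `token` (illustrative only)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(token) < n:
        # Token shorter than n: the token itself is kept, as the description states.
        return [token]
    # Slide a window of width n over the token, one character at a time.
    return [token[i:i + n] for i in range(len(token) - n + 1)]


print(char_ngrams("mining", 3))  # ['min', 'ini', 'nin', 'ing']
print(char_ngrams("of", 3))      # ['of']
```

Under these assumptions, a token of length m with m >= n yields m - n + 1 n-grams.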

Input

  • document

    The document port.

Output

  • document

    The document port.

Parameters

  • length

    The length n of the n-grams.

  • keep terms

    Indicates if the original terms (i.e. tokens) should be kept along with the created n-grams.
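How the two parameters might interact over a whole token list is sketched below in Python. This is a rough illustration under stated assumptions, not the operator's code: the function name generate_char_ngrams is hypothetical, the order of the output (original token first, then its n-grams) is an assumption, and so is the choice to emit a token shorter than n only once.

```python
from typing import Iterable, List


def generate_char_ngrams(tokens: Iterable[str], length: int,
                         keep_terms: bool = False) -> List[str]:
    """Mimic the 'length' and 'keep terms' parameters on a list of tokens."""
    if length < 1:
        raise ValueError("length must be at least 1")
    result: List[str] = []
    for token in tokens:
        if len(token) < length:
            # Short token: kept as-is (assumed to appear once, regardless of keep_terms).
            result.append(token)
            continue
        if keep_terms:
            # Assumption: the original term is emitted before its n-grams.
            result.append(token)
        result.extend(token[i:i + length]
                      for i in range(len(token) - length + 1))
    return result


tokens = ["text", "is", "data"]
print(generate_char_ngrams(tokens, 3))
# ['tex', 'ext', 'is', 'dat', 'ata']
print(generate_char_ngrams(tokens, 3, keep_terms=True))
# ['text', 'tex', 'ext', 'is', 'data', 'dat', 'ata']
```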